Problem Note 49022: The Text Import node does not process more than 99,999 documents

In the Text Import node of SAS® Text Miner, at most 99,999 documents can be converted into text documents in the destination directory. If the "Import File Directory" contains more than 99,999 documents, then some of the documents are not processed correctly. The destination directory might not contain all of the first 99,999 converted documents.

No warning or error messages are issued.
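
Because the node gives no indication of the problem, it can help to count the files in the import directory before you run it. The following is a minimal sketch, not part of the original note: the path in import_dir and the fileref name impdir are hypothetical, and DNUM counts every member that SAS sees in the directory, which might include subdirectories on some hosts.

```sas
/* Hypothetical path: replace with your own "Import File Directory". */
%let import_dir = E:\tracks\docs;

data _null_;
   length msg $ 200;
   /* Assign a temporary fileref to the directory. */
   rc  = filename('impdir', "&import_dir");
   did = dopen('impdir');             /* returns 0 if the directory cannot be opened */
   if did = 0 then do;
      msg = sysmsg();
      put 'ERROR: Could not open the directory. ' msg;
   end;
   else do;
      n_members = dnum(did);          /* number of members in the directory */
      put 'NOTE: Members found: ' n_members;
      if n_members > 99999 then
         put 'WARNING: More than 99,999 members. Split the folder before importing.';
      rc = dclose(did);
   end;
   rc = filename('impdir', '');       /* clear the fileref */
run;
```

The messages are written to the SAS log, so the check can be run from any SAS session that can see the directory.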

The following steps provide a workaround:

  1. Move the documents in the "Import File Directory" to multiple folders, say k folders, so that each folder contains approximately 1/k of the total number of documents. Make sure that each folder contains fewer than 99,999 documents.

  2. Create k Text Import nodes that import documents from their corresponding folders. Run each Text Import node.

  3. After all of the k Text Import nodes are completed, create a SAS Code node and connect it to each of those k Text Import nodes. Insert the following SAS code in the Code Editor. The code concatenates all of the exported data sets that are created by those k Text Import nodes into a new SAS data set. The new SAS data set is saved in a system folder that you define:

     libname mylib "E:\tracks\TM";   /* a folder where you have permission to write */

     data mylib.combined;
        set EMWS<n>.TextImport_TRAIN
            EMWS<n>.TextImport2_TRAIN
            ... (more lines skipped)
            EMWS<n>.TextImportk_TRAIN
            ;   /* where EMWS<n> is the ID value for the diagram */
     run;

Submit the code and check the Log to make sure that there are no errors. Create a new data source for the "mylib.combined" data set.
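
When k is large, typing every data set name by hand is error-prone. The following sketch generates the SET list with a macro loop instead. It assumes that the exported data sets follow the naming pattern shown in step 3 (TextImport_TRAIN for the first node, then TextImport2_TRAIN through TextImportk_TRAIN), which might not hold if nodes were deleted and re-created in the diagram. The macro name, its parameters, and the values EMWS1 and k=3 are illustrative only.

```sas
/* Sketch only: the macro name and its parameters are illustrative. */
%macro combine_text_imports(ws=, k=, out=mylib.combined);
   %local i;
   data &out;
      set &ws..TextImport_TRAIN           /* first node has no numeric suffix */
      %do i = 2 %to &k;
          &ws..TextImport&i._TRAIN        /* TextImport2_TRAIN, TextImport3_TRAIN, ... */
      %end;
      ;
   run;
%mend combine_text_imports;

libname mylib "E:\tracks\TM";             /* a folder where you have permission to write */

/* EMWS1 and k=3 are example values; use your diagram ID and node count. */
%combine_text_imports(ws=EMWS1, k=3);

/* Optional check: the row count should match the total number of imported documents. */
proc sql;
   select count(*) as n_documents from mylib.combined;
quit;
```

Comparing n_documents with the number of source files is a quick way to confirm that no folder was skipped, because the Text Import node itself does not report dropped documents.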



Operating System and Release Information

Product Family: SAS System
Product: SAS Text Miner

System                                            Product Release   SAS Release
                                                  Reported  Fixed*  Reported   Fixed*
Solaris for x64                                   5.1       12.3    9.3 TS1M0  9.4 TS1M0
Linux for x64                                     5.1       12.3    9.3 TS1M0  9.4 TS1M0
HP-UX IPF                                         5.1       12.3    9.3 TS1M0  9.4 TS1M0
64-bit Enabled Solaris                            5.1       12.3    9.3 TS1M0  9.4 TS1M0
Microsoft Windows Server 2008 for x64             5.1       12.3    9.3 TS1M0  9.4 TS1M0
Microsoft Windows XP Professional                 5.1       12.3    9.3 TS1M0  9.4 TS1M0
Microsoft Windows Server 2003 for x64             5.1       12.3    9.3 TS1M0  9.4 TS1M0
Microsoft Windows Server 2003 Standard Edition    5.1       12.3    9.3 TS1M0  9.4 TS1M0
64-bit Enabled AIX                                5.1       12.3    9.3 TS1M0  9.4 TS1M0
Windows 7 Ultimate x64                            5.1       12.3    9.3 TS1M0  9.4 TS1M0
Windows 7 Ultimate 32 bit                         5.1       12.3    9.3 TS1M0  9.4 TS1M0
Windows 7 Professional x64                        5.1       12.3    9.3 TS1M0  9.4 TS1M0
Windows 7 Professional 32 bit                     5.1       12.3    9.3 TS1M0  9.4 TS1M0
Windows 7 Home Premium x64                        5.1       12.3    9.3 TS1M0  9.4 TS1M0
Windows 7 Home Premium 32 bit                     5.1       12.3    9.3 TS1M0  9.4 TS1M0
Windows 7 Enterprise x64                          5.1       12.3    9.3 TS1M0  9.4 TS1M0
Windows 7 Enterprise 32 bit                       5.1       12.3    9.3 TS1M0  9.4 TS1M0
Microsoft Windows Server 2008                     5.1       12.3    9.3 TS1M0  9.4 TS1M0
Microsoft Windows Server 2003 Enterprise Edition  5.1       12.3    9.3 TS1M0  9.4 TS1M0
Microsoft Windows Server 2003 Datacenter Edition  5.1       12.3    9.3 TS1M0  9.4 TS1M0
Microsoft® Windows® for x64                       5.1       12.3    9.3 TS1M0  9.4 TS1M0

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.